Abstract. We study a clean machine model for external memory and
stream processing. We show that the number of scans of the external data 
induces a strict hierarchy (as long as work space is sufficiently small, e.g., 
polylogarithmic in the size of the input). We also show that neither joins 
nor sorting are feasible if the product of the number r(n) of scans of 
the external memory and the size s(n) of the internal memory buffers is 
sufficiently small, e.g., of size $o(\sqrt[5]{n})$. We also establish tight bounds for
the complexity of XPath evaluation and filtering. 



1 Introduction 

It is generally assumed that databases have to reside in external, inexpensive 
storage because of their sheer size. Current technology for external storage systems
(disks and tapes) presents us with the reality that, performance-wise, a small
number of sequential scans of the data is strictly preferable over random data 
accesses. Indeed, the combined latencies and access times of moving to a certain 
position in external storage are by orders of magnitude greater than actually 
reading a small amount of data once the read head has been placed on its start- 
ing position. 

Database engines rely on main memory buffers for assuring acceptable per- 
formance. These are usually small compared to the size of the externally stored 
data. Database technology - in particular query processing technology - has de- 
veloped around this notion of memory hierarchies with layers of greatly varying 
sizes and access times. There has been a wealth of research on query process- 
ing and optimization along these lines (cf. e.g. [27,14,32,22]). It seems that the 
current technologies scale up to current user expectations, but on closer investi- 
gation it may appear that our theoretical understanding of the problems involved 
- and of optimal algorithms for these problems - is not quite as developed. 

Recently, data stream processing has become an object of study by the data 
management community (e.g. [15]) but from the viewpoint of database theory, 
this is, in fact, a special case of the query processing problem on data in external 
storage where we are limited to a single scan of the input data. 

In summary, it appears that there are a variety of data management and 
query processing problems in which a comparably small but efficiently accessi- 
ble main memory buffer is available and where accessing external data is costly 
and is best performed by sequential read/write scans. This calls for an appropriate
formal model that captures the essence of external memory and stream
processing. In this paper, we study such a model, which employs a Turing machine
with one external memory tape (external tape for short) and a number of
internal memory tapes (internal tapes for short). The external tape initially
holds the input; the internal tapes correspond to the main memory buffers of a 
database management system and are thus usually small compared to the input. 

As computational resources for inputs of size n, we study the space s(n) avail- 
able on the internal tapes and the number r(n) of scans of (or, random accesses 
to) the external tape, and we write ST(r, s) to denote the class of all problems 
solvable by (r, s)-bounded Turing machines, i.e., Turing machines which comply
with the resource bounds r(n) and s(n) on inputs of size n.

Formally, we model the number of scans, respectively the number of random 
accesses, by the number of reversals of the Turing machine's read/write head on 
the external tape. The number of reversals of the read/write head on the internal 
tapes remains unbounded. The reversals done by a read/write head are a clean
and fundamental notion [8], but of course real external storage technology based
on disks does not allow reversing the direction of rotation. On the other hand,
we can of course simulate k forward scans by 2k reversals in our machine model 
- and allowing for forward as well as backward scans makes our lower bound 
results even stronger. 

As we allow the external tape to be both read and written to, the external 
tape can be viewed, for example, as modeling a hard disk. By closely watching 
reversals of the external tape head, anything close to random I/O will result 
in a very considerable number of reversals, while a full sequential scan of the 
external data can be effected cheaply. We will obtain strong lower bounds in 
this paper that show that even if the external tape (whose size we do not put a 
bound on) may be written to and re-read, certain bounds cannot be improved 
upon. For our matching upper bounds, we will usually not write to the external 
tape. Whenever one of our results requires writing to the external tape, we will 
explicitly indicate this. 

The model is similar in spirit to the frameworks used in [18, 19], but differs 
from the previously considered reversal complexity framework [8]. Reversal complexity
is based on Turing machines with a single read/write tape, with the overall
number of reversals of the read/write head as the main computational resource. In
our notion, only the number of reversals on the external tape is bounded, while 
reversals on the internal tapes are free; however, the space on the internal tapes 
is considered to be a limited resource. 3 



3 The justification for this assumption is simply that accessing data on disks is cur- 
rently about five to six orders of magnitude slower than accessing main memory. 
For that reason, processor cycles and main memory access times are often neglected 
when estimating query cost in relational query optimizers, where cost measures are 
often exclusively based on the amount of expected page I/O as well as disk latency 
and access times. Moreover, by taking buffer space rather than running time as a 
parameter, we obtain more robust complexity classes that rely less on details of the 
machine model (see also [31]). 

Apart from formalizing the ST(r, s) model, we study its properties and locate
a number of data management problems in the hierarchy of ST(·, ·) classes.
Our technical contributions are as follows: 

• We prove a reduction lemma (Lemma 4.1) which allows easy lower bound 
proofs for certain problems. 

• We prove a hierarchy (Corollary 4.11 and Theorem 4.10), stating for each 
fixed number k that k + 1 scans of the external memory tape are strictly more 
powerful than k scans of the external memory tape. 

• We consider machines where the product of the number of scans of the external
memory tape, r(n), and internal memory tape size, s(n), is of size $o(n / \log n)$,
where n is the input size, and show that joins cannot be computed by
(r, s)-bounded Turing machines (cf. Lemma 4.4).

• We show that the sorting problem cannot be solved with
$(o(\sqrt[5]{n}), O(\sqrt[5]{n}))$-bounded Turing machines that are not allowed to
write intermediate results to the external memory tape (cf. Corollary 4.9).

• We show (cf. Theorem 4.5) that for some XQuery queries, filtering is impossible
for machines with $r(T) \cdot s(T) \in o(n / \log n)$, where n is the size of the input
XML document T.

• We show (cf. Corollary 5.5) that for some Core XPath [12] queries, filtering is
impossible for machines with $r(T) \cdot s(T) \in o(d)$, where d denotes the depth of
the input XML document T. Furthermore, we show that the lower bound on
Core XPath is tight in that we give an algorithm that solves the Core XPath
filtering problem with a single scan of the external data (zero reversals) and
O(d) buffer space.

The primary technical machinery that we use for obtaining lower bounds 
is that of communication complexity (cf. [21]). Techniques from communication 
complexity have been used previously to study queries on streams [4, 6, 2, 3, 5, 23, 
24, 18]. The work reported on in [4] addresses the problem of determining whether 
a given relational query can be evaluated scalably on a data stream or not at 
all. In comparison, we ask for tight bounds on query evaluation problems, i.e. 
we give algorithms for query evaluation that are in a sense worst-case optimal. 
As we do, the authors of [6] study XPath evaluation; however, they focus on 
instance data complexity while we study worst-case bounds. This allows us to 
find strong and tight bounds for a greater variety of query evaluation problems. 
Many of our results apply beyond stream processing in a narrow sense to a more 
general framework of queries on data in external storage. Also, our worst-case 
bounds apply for any evaluation algorithm possible, that is, our bounds are not 
in terms of complexity classes closed under reductions that allow for nonlinear 
expansions of the input (such as LOGSPACE) as is the case for the work on the 
complexity of XPath in [12, 13, 28]. 

Lower bound results for a machine model with multiple external memory
tapes (or hard disks) are presented in [17]. In the present paper, we only consider
a single external memory tape, and are consequently able to show (sometimes 
exponentially) stronger lower bounds. 

The present paper is the full version of the conference contribution [16]. 

2 Preliminaries

In this section we fix some basic notation concerning trees, streams, and query 
languages. We write N for the set of non-negative integers. If M is a set, then
$2^M$ denotes the set of all subsets of M. Throughout this paper we make the
following convention: Whenever the letters r and s denote functions from N to
N, then these functions are monotone, i.e., we have $r(x) \le r(y)$ and $s(x) \le s(y)$
for all $x, y \in N$ with $x \le y$.

Trees and Streams. We use standard notation for trees and streamed trees 
(i.e. documents). In particular, we write Doc(T) to denote the XML document 
associated with an XML document tree T. An example is given in Figure 1. 
Some more details on trees and streams can be found in Appendix A. 

Query Languages. By Eval(·, ·) we denote the evaluation function that maps
each tuple (Q, T), consisting of a query Q and a tree T, to the corresponding query
result. Let $\mathcal{Q}$ be a query language and let $\mathcal{T}_1 \subseteq \mathrm{Trees}_\tau$ and
$\mathcal{T}_2 \subseteq \mathcal{T}_1$. We say that $\mathcal{T}_2$ can be filtered from $\mathcal{T}_1$ by a
$\mathcal{Q}$-query if, and only if, there is a query $Q \in \mathcal{Q}$ such that the following
is true for all $T \in \mathcal{T}_1$: $T \in \mathcal{T}_2 \iff \mathrm{Eval}(Q, T) \ne \emptyset$.

We assume that the reader is familiar with first-order logic (FO) and monadic 
second-order logic (MSO). An FO- or MSO-sentence (i.e., a formula without any 
free variable) specifies a Boolean query, whereas a formula with exactly one free 
first-order variable specifies a unary query, i.e., a query which selects a set of 
nodes from the underlying input tree. 

It is well-known [9, 30] that the MSO-definable Boolean queries on binary
trees are exactly the (Boolean) queries that can be defined by finite (deterministic
or nondeterministic) bottom-up tree automata. An analogous statement is true
about MSO on unranked trees and unranked tree automata [7]. 

Theorem 4.5 in Section 4 gives a lower bound on the worst case complexity
of the language XQuery. As we prove a lower bound for one particular XQuery
query, we do not give a formal definition of the language but refer to [33].

Apart from FO, MSO, and XQuery, we also consider a fragment of the XPath 
language, Core XPath [12, 13]. As we will prove not only lower, but also upper 
bounds for Core XPath, we give a precise definition of this query language in 
Appendix B. An example of a Core XPath query is

    /descendant::*[child::A and child::B]/child::*,

which selects all children of descendants of the root node that (i.e., the descendants)
have a child node labeled A and a child node labeled B.

Core XPath is a strict fragment of XPath [12], both syntactically and semantically.
It is known that Core XPath is in LOGSPACE w.r.t. data complexity
and P-complete w.r.t. combined complexity [13]. In [12], it is shown that Core
XPath can be evaluated in time $O(|Q| \cdot |D|)$, where |Q| is the size of the query
and |D| is the size of the XML data. Furthermore, every Core XPath query is
equivalent to a unary MSO query on trees (cf., e.g., [11]).

Communication complexity. To prove basic properties and lower bounds
for our machine model, we use some notions and results from communication 
complexity, cf., e.g., [21]. 

Let A, B, C be sets and let $F : A \times B \to C$ be a function. In Yao's [34] basic
model of communication two players, Alice and Bob, jointly want to evaluate
F(x, y), for input values $x \in A$ and $y \in B$, where Alice only knows x and Bob
only knows y. The two players can exchange messages according to some fixed
protocol $\mathcal{P}$ that depends on F, but not on the particular input values x, y. The
exchange of messages starts with Alice sending a message to Bob and ends as
soon as one of the players has enough information on x and y to compute F(x, y).

$\mathcal{P}$ is called a k-round protocol, for some $k \in N$, if the exchange of messages
consists, for each input $(x, y) \in A \times B$, of at most k rounds. The cost of $\mathcal{P}$ on
input (x, y) is the number of bits communicated by $\mathcal{P}$ on input (x, y). The cost
of $\mathcal{P}$ is the maximal cost of $\mathcal{P}$ over all inputs $(x, y) \in A \times B$. The communication
complexity of F, comm-compl(F), is defined as the minimum cost of $\mathcal{P}$, over all
protocols $\mathcal{P}$ that compute F. For $k \ge 1$, the k-round communication complexity
of F, $\text{comm-compl}_k(F)$, is defined as the minimum cost of $\mathcal{P}$, over all k-round
protocols $\mathcal{P}$ that compute F.
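
As an illustration of these definitions, the following Python sketch simulates the trivial one-round protocol for deciding whether two subsets of {1, .., n} are disjoint: Alice sends the n-bit characteristic string of her set, and Bob decides. The function names (`alice_message`, `bob_decides`, `run_protocol`) are hypothetical and chosen only for this example; the sketch shows an upper bound of n bits on the cost and is not meant as an optimal protocol.

```python
from __future__ import annotations


def alice_message(x: set[int], n: int) -> str:
    """Alice encodes her set x as an n-bit characteristic string."""
    return "".join("1" if i in x else "0" for i in range(1, n + 1))


def bob_decides(message: str, y: set[int]) -> bool:
    """Bob answers True iff no element of y is marked in Alice's message."""
    return all(message[i - 1] == "0" for i in y)


def run_protocol(x: set[int], y: set[int], n: int) -> tuple[bool, int]:
    """Run the one-round protocol; return (answer, number of bits sent)."""
    msg = alice_message(x, n)
    return bob_decides(msg, y), len(msg)
```

Here the cost on every input is exactly n bits, since the only message exchanged is Alice's characteristic string.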

Many powerful tools are known for proving lower bounds on communication 
complexity, cf., e.g., [21]. In the present paper we will use the following basic 
lower bound for the problem of deciding whether two sets are disjoint. 

Definition 2.1. For $n \in N$ let the function
$\mathrm{Disj}_n : 2^{\{1, \ldots, n\}} \times 2^{\{1, \ldots, n\}} \to \{0, 1\}$ be given via

    $\mathrm{Disj}_n(X, Y) := 1$ if $X \cap Y = \emptyset$, and $\mathrm{Disj}_n(X, Y) := 0$ otherwise. □

Theorem 2.2 (cf., e.g., [21]). For every $n \in N$, $\text{comm-compl}(\mathrm{Disj}_n) \ge n$.



3 Machine Model

We consider Turing machines with (1) an input tape, which is a read/write tape
and will henceforth be called "external memory tape" or "external tape", for
short, (2) an arbitrary number u of work tapes, which will henceforth be called
"internal memory tapes" or "internal tapes", for short, and, if needed, (3) an
additional write-only output tape.

Let M be such a Turing machine and let ρ be a run of M. By rev(ρ) we denote
the number of times the external memory tape's head changes its direction in
the run ρ. For $i \in \{1, \ldots, u\}$ we let space(ρ, i) be the number of cells of internal
memory tape i that are used by ρ.

The class ST(r, s) for strings.

Definition 3.1 (ST(r, s) for strings). Let $r : N \to N$ and $s : N \to N$.

(a) A Turing machine M is (r, s)-bounded, if every run ρ of M on an input of
length n satisfies the following conditions:
(1) ρ is finite, (2) $1 + \mathrm{rev}(\rho) \le r(n)$,⁴ and (3) $\sum_{i=1}^{u} \mathrm{space}(\rho, i) \le s(n)$,
where u is the number of internal tapes of M.
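
To make conditions (2) and (3) concrete, the following Python sketch counts head reversals in a recorded trace of external-tape head moves and checks the two resource bounds for one finite run. It is only an illustration under simplifying assumptions: the trace format (a list of +1 and -1 moves, with no "stay" moves) and the names `count_reversals` and `is_within_bounds` are hypothetical and not part of the formal model.

```python
def count_reversals(moves):
    """Return rev(rho): how often the external head changes its direction.

    `moves` is a sequence of +1 (step right) and -1 (step left).
    """
    reversals = 0
    direction = None
    for m in moves:
        if m not in (1, -1):
            raise ValueError("moves must be +1 or -1")
        if direction is not None and m != direction:
            reversals += 1
        direction = m
    return reversals


def is_within_bounds(moves, internal_cells_used, r_n, s_n):
    """Check 1 + rev(rho) <= r(n) and sum_i space(rho, i) <= s(n) for one run.

    `internal_cells_used[i]` is the number of cells used on internal tape i.
    """
    return (1 + count_reversals(moves) <= r_n
            and sum(internal_cells_used) <= s_n)
```

For example, a run that moves right three times, left twice, and right once has two reversals, so it respects the scan bound r(n) = 3 but not r(n) = 2.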

(b) A string-language $L \subseteq \Sigma^*$ belongs to the class ST(r, s) (resp., NST(r, s)), if
there is a deterministic (respectively, nondeterministic) (r, s)-bounded Turing
machine which accepts exactly those $w \in \Sigma^*$ that belong to L.

(c) A function $f : \Sigma^* \to \Sigma^*$ belongs to the class ST(r, s), if there is a deterministic
(r, s)-bounded Turing machine which produces, for each input string
$w \in \Sigma^*$, the string f(w) on its write-only output tape. □

For classes R and S of functions, we let $ST(R, S) := \bigcup_{r \in R,\, s \in S} ST(r, s)$.
If $k \in N$ is a constant, then we write ST(k, s) instead of ST(r, s), where r is the
function with r(x) = k for all $x \in N$. We freely combine these notations and use
them for NST(·, ·) instead of ST(·, ·), too.

If we think of the external memory tape of an (r, s)-bounded Turing machine
as representing the incoming stream, stored on a hard disk, then allowing the
external memory tape's head to reverse its direction might not be very realistic.
But as we mainly use our model to prove lower bounds, it does not do any harm 
either, since the reversals can be used to simulate random access. Random access 
can be introduced explicitly into our model as follows: A random access Turing 
machine is a Turing machine M which has a special internal memory tape that 
is used as random access address tape, i.e., on which only binary strings can be 
written. Such a binary string is interpreted as a positive integer specifying an 
external memory address, that is, the position index number of a cell on the 
external tape (we think of the external tape cells being numbered by positive 
integers). The machine has a special state $q_{ra}$. If $q_{ra}$ is entered, then in one
step the external memory tape head is moved to the cell that is specified by 
the number on the random access address tape, and the content of the random 
access address tape is deleted. 

Definition 3.2. Let $q, r, s : N \to N$. A random access Turing machine M is
(q, r, s)-bounded, if it is (r, s)-bounded (in the sense of an ordinary Turing machine)
and, in addition, every run ρ of M on an input of length n involves at
most q(n) random accesses. □

Noting that a random access can be simulated with at most 2 changes of the 
direction of the external memory tape head, one immediately obtains: 

Lemma 3.3. Let $q, r, s : N \to N$. If a problem can be solved by a (q, r, s)-bounded
random access Turing machine, then it can also be solved by an (r + 2q, O(s))-bounded
Turing machine.
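
The counting argument behind Lemma 3.3 can be sketched in Python as follows. The class `ExternalTapeHead` and its method `jump` are hypothetical names used only for this illustration; the sketch assumes that the head initially moves to the right and that, after each simulated random access, the simulating machine restores a fixed direction of travel. Walking to the target address changes direction at most once, and restoring the direction changes it at most once more.

```python
class ExternalTapeHead:
    """Toy model of the external tape head that counts direction changes."""

    def __init__(self):
        self.position = 1       # cells are numbered by positive integers
        self.direction = +1     # assumption: the head starts moving right
        self.reversals = 0

    def _step(self, d):
        """Move one cell in direction d, counting a reversal if needed."""
        if d != self.direction:
            self.reversals += 1
            self.direction = d
        self.position += d

    def jump(self, address, resume_direction=+1):
        """Simulate one random access to `address` by sequential moves."""
        while self.position != address:
            self._step(+1 if address > self.position else -1)
        if self.direction != resume_direction:
            # turning back to the resumed direction costs one more reversal
            self.direction = resume_direction
            self.reversals += 1
```

In this toy model, q jumps add at most 2q reversals to the reversals of the simulated machine, which matches the bound r + 2q in the lemma.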

In the subsequent parts of this paper, we will concentrate on ordinary Turing 
machines (without random access). Via Lemma 3.3, all results can be transferred 
from ordinary Turing machines to random access Turing machines. 

4 It is convenient for technical reasons to add 1 to the number rev(p) of changes of 
the head direction. As defined here, r(n) bounds the number of sequential scans of 
the external memory tape rather than the number of changes of head directions. 

The class ST(r, s) for trees. We make an analogous definition to ST(r, s) on
strings for trees. This definition is given in detail in Appendix C. 

4 Lower bounds for the ST model 

A reduction lemma. The following lemma provides a convenient tool for 
showing that a problem L does not belong to ST(r, s). The lemma's assumption 
can be viewed as a reduction from the problem $\mathrm{Disj}_n(\cdot, \cdot)$ to the problem L.

Lemma 4.1. Let $\Sigma$ be an alphabet and let $\lambda : N \to N$ such that the following is
true: For every $n_0 \in N$ there is an $n \ge n_0$ and functions $f_n, g_n : 2^{\{1, \ldots, n\}} \to \Sigma^*$
such that for all $X, Y \subseteq \{1, \ldots, n\}$ the string $f_n(X) g_n(Y)$ has length $\le \lambda(n)$.

Then we have for all $r, s : N \to N$ with $r(\lambda(n)) \cdot s(\lambda(n)) \in o(n)$, that there
is no (r, s)-bounded deterministic Turing machine which accepts a string of the
form $f_n(X) g_n(Y)$ if, and only if, $X \cap Y = \emptyset$.

The proof of this lemma can be found in Appendix D. 

Disjointness. Every n-bit string $x = x_1 \cdots x_n \in \{0, 1\}^n$ specifies a set
$S(x) := \{ i : x_i = 1 \} \subseteq \{1, \ldots, n\}$. Let $L_{Disj}$ consist of those strings x#y where x and y
specify disjoint subsets of {1, .., n}, for some $n \ge 1$. That is,

    $L_{Disj} := \{ x\#y : \text{ex. } n \ge 1 \text{ with } x, y \in \{0, 1\}^n \text{ and } S(x) \cap S(y) = \emptyset \}$.

From Lemma 4.1 one easily obtains 

Proposition 4.2. Let $r : N \to N$ and $s : N \to N$. If $r(n) \cdot s(n) \in o(n)$, then
$L_{Disj} \notin ST(r, s)$.

The proof can be found in Appendix E. The bound given by Proposition 4.2 is
tight, as it can be easily seen that $L_{Disj} \in ST(r, s)$ for all $r, s : N \to N$ with
$r(n) \cdot s(n) \in \Omega(n)$.
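
One way to see the upper bound is a block-wise strategy: in each scan, buffer s bits of x, then compare them against the corresponding s bits of y. The following Python sketch simulates this idea under stated assumptions: the input is a well-formed string x#y with |x| = |y|, only forward scans are counted (in the reversal model, scan counts change by at most a constant factor), and the position counters of O(log n) bits are not charged to the buffer. The function name `disjoint_with_scans` is hypothetical.

```python
def disjoint_with_scans(w, s):
    """Decide whether x#y encodes disjoint sets, buffering s bits per scan.

    Returns (answer, number_of_forward_scans). Assumes w is well formed.
    """
    scans = 0
    block_start = 0
    n = None                      # length of x, learned during the first scan
    while n is None or block_start < n:
        scans += 1
        buffer = []               # holds at most s bits of x
        i = 0                     # position counter inside x, then inside y
        in_y = False
        for ch in w:              # one sequential pass over the external data
            if ch == "#":
                n, i, in_y = i, 0, True
                continue
            if block_start <= i < block_start + s:
                if not in_y:
                    buffer.append(ch)
                elif ch == "1" and buffer[i - block_start] == "1":
                    return False, scans   # common element found
            i += 1
        block_start += s
    return True, scans
```

With buffer size s the sketch needs about n/s scans, so the product of scans and buffer size stays linear in n.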

Joins. Let τ be the set of tag names { rels, rel1, rel2, tuple, no1, no2, 0, 1 }.
We represent a pair (A, B) of finite relations $A, B \subseteq N^2$ as a τ-tree T(A, B)
whose associated XML document Doc(T(A, B)) is a $\Sigma_\tau$-string of the following
form: For each number $i \in N$ let $\mathrm{Bin}(i) = b^{(i)}_0 \cdots b^{(i)}_{\ell_i}$ be the binary representation
of i. For each tuple $(i, j) \in \{1, \ldots, n\}^2$ let Doc(i, j) :=

    <tuple> <no1> <$b^{(i)}_0$/> $\cdots$ <$b^{(i)}_{\ell_i}$/> </no1> <no2> <$b^{(j)}_0$/> $\cdots$ <$b^{(j)}_{\ell_j}$/> </no2> </tuple>.

For each finite relation $A \subseteq N^2$ let $t_1, \ldots, t_{|A|}$ be the lexicographically ordered
list of all tuples in A. We let $\mathrm{Doc}(A) := \mathrm{Doc}(t_1) \cdots \mathrm{Doc}(t_{|A|})$. Finally, we let

    Doc(T(A, B)) := <rels> <rel1> Doc(A) </rel1> <rel2> Doc(B) </rel2> </rels>.

It is straightforward to see that the string Doc(T(A, B)) has length
$O((|A| + |B|) \cdot \log n)$, if $A, B \subseteq \{1, \ldots, n\}^2$.
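
The encoding can be sketched in Python as follows. This is an illustration only: the helper names (`doc_number`, `doc_tuple`, `doc_relation`, `doc_pair`) are hypothetical, the bit order (most significant bit first) is an assumption, and whitespace between tags is omitted. Note also that in the paper each tag is a single symbol of the alphabet, whereas the Python string spells every tag out character by character.

```python
def doc_number(i):
    """Encode i as a sequence of bachelor tags <0/> and <1/> (MSB first)."""
    return "".join(f"<{bit}/>" for bit in format(i, "b"))


def doc_tuple(i, j):
    """Encode the tuple (i, j)."""
    return (f"<tuple><no1>{doc_number(i)}</no1>"
            f"<no2>{doc_number(j)}</no2></tuple>")


def doc_relation(rel):
    """Concatenate the encodings of all tuples in lexicographic order."""
    return "".join(doc_tuple(i, j) for (i, j) in sorted(rel))


def doc_pair(a, b):
    """Encode the pair (A, B) of finite binary relations."""
    return (f"<rels><rel1>{doc_relation(a)}</rel1>"
            f"<rel2>{doc_relation(b)}</rel2></rels>")
```

For instance, `doc_tuple(2, 1)` yields `<tuple><no1><1/><0/></no1><no2><1/></no2></tuple>`, and every number from {1, .., n} contributes O(log n) bit tags, in line with the length bound stated above.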

We write $A \bowtie_1 B$ to denote the join of A and B on their first component,
i.e., $A \bowtie_1 B := \{ (x, y) : \exists z\; A(z, x) \wedge B(z, y) \}$. We let

    $\mathcal{T}_{Rels} := \{ T(A, B) : A, B \subseteq N^2,\ A, B \text{ finite} \}$

    $\mathcal{T}_{EmptyJoin} := \{ T(A, B) \in \mathcal{T}_{Rels} : A \bowtie_1 B = \emptyset \}$

    $\mathcal{T}_{NonEmptyJoin} := \{ T(A, B) \in \mathcal{T}_{Rels} : A \bowtie_1 B \ne \emptyset \}$.
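
For small in-memory relations, the join on the first component and the emptiness test that separates the two tree-languages can be written directly from the definition. The Python function names `join_first` and `has_empty_join` below are hypothetical; the sketch ignores all external-memory aspects and only illustrates the semantics.

```python
def join_first(a, b):
    """Return {(x, y) : there is z with (z, x) in a and (z, y) in b}."""
    return {(x, y) for (z1, x) in a for (z2, y) in b if z1 == z2}


def has_empty_join(a, b):
    """True iff the join of a and b on their first component is empty."""
    return not join_first(a, b)
```

For example, joining {(1, 2), (3, 4)} with {(1, 5), (2, 6)} gives {(2, 5)}, since 1 is the only shared first component.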

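Lemma 4.3. $\mathcal{T}_{NonEmptyJoin}$ can be filtered from $\mathcal{T}_{Rels}$ by an XQuery query.

Lemma 4.4. Let $r, s : \mathrm{Trees}_\tau \to N$.
If $r(T) \cdot s(T) \in o\big(\mathrm{size}(T) / \log(\mathrm{size}(T))\big)$, then $\mathcal{T}_{EmptyJoin} \notin ST(r, s)$.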

The proofs of these two lemmas can be found in Appendices F and G, respectively.
From Lemma 4.4 and Lemma 4.3 we immediately obtain a lower bound on the 
worst-case data complexity for filtering relative to an XQuery query: 

Theorem 4.5. The tree-language $\mathcal{T}_{EmptyJoin}$

(a) can be filtered from $\mathcal{T}_{Rels}$ by an XQuery query,

(b) does not belong to the class ST(r, s), whenever $r, s : \mathrm{Trees}_\tau \to N$ with
$r(T) \cdot s(T) \in o\big(\mathrm{size}(T) / \log(\mathrm{size}(T))\big)$.

Remark 4.6. Let us note that the above bound is "almost tight" in the following
sense: The problem of deciding whether $A \bowtie_1 B = \emptyset$ and, in general, all
FO-definable problems belong to ST(1, n): in its single scan of the external
memory tape, the Turing machine simply copies the entire input onto one of its
internal memory tapes and then evaluates the FO-sentence by the straightforward
LOGSPACE algorithm for FO-model-checking (cf. e.g. [1]). □

Sorting. By KeySort, we denote the problem of sorting a set S of tuples 
t = (K, V) consisting of a key K and a value V by their keys. Let $ST^{-}(r, s)$
denote the class of all problems in ST(r, s) that can be solved without writing 
to the external memory tape. Then, 

Theorem 4.7. Let $r, s : N \to N$. If KeySort is in $ST^{-}(r, s)$, then computing
the natural join $A \bowtie B$ of two finite relations A, B is in

    $ST^{-}\big(r(n^2) + 2,\ s(n^2) + O(\log n) + O(\max_{t \in A \cup B} |t|)\big)$.

A proof is given in Appendix H. 

Remark 4.8. Given that the size of relations A and B is known (which is usually
the case in practical database management systems (DBMS)), the algorithm given
in the previous proof can do a merge-join without additional scans after the 
sort run and without a need to buffer more than one tuple. This is guaranteed 
even if both relations may contain many tuples with the same join key - in 
current implementations of the merge join in DBMS, this may lead to grass- 
roots swapping. The (substantial) practical drawback of the join algorithm of 
the proof of Theorem 4.7, however, is that much larger relations A', B' need to
be sorted: indeed $|A'| = |A| \cdot |B|$. □

Corollary 4.9.

(a) Let $r, s : N \to N$ such that $r(n^2) \cdot (s(n^2) + \log n) \in o(n / \log n)$.
Then, KeySort $\notin ST^{-}(r, s)$.

(b) KeySort $\notin ST^{-}\big(o(\sqrt[5]{n}), O(\sqrt[5]{n})\big)$.

The proof is given in Appendix I. 

It is straightforward to see that by using MergeSort, the sorting problem can 
be solved using O(log n) scans of external memory provided that three external
memory tapes are available. (In [17], this logarithmic bound is shown to be 
tight, for arbitrarily many external tapes.) Corollary 4.9 gives an exponentially 
stronger lower bound for the case of a single external tape. 
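
The logarithmic number of passes can be illustrated by a bottom-up merge sort in which every pass reads the data sequentially and doubles the length of the sorted runs. The Python sketch below is an in-memory simulation only: the function name `merge_sort_passes` is hypothetical, and a genuine three-tape implementation would distribute runs onto two tapes and merge them onto the third, so that each pass of the sketch corresponds to a constant number of scans.

```python
def merge_sort_passes(items):
    """Sort by repeated sequential merge passes; return (sorted, passes)."""
    data = list(items)
    run = 1                       # current length of sorted runs
    passes = 0
    while run < len(data):
        merged = []
        for start in range(0, len(data), 2 * run):
            left = data[start:start + run]
            right = data[start + run:start + 2 * run]
            i = j = 0
            # standard two-way merge of two adjacent sorted runs
            while i < len(left) and j < len(right):
                if left[i] <= right[j]:
                    merged.append(left[i])
                    i += 1
                else:
                    merged.append(right[j])
                    j += 1
            merged.extend(left[i:])
            merged.extend(right[j:])
        data = merged
        run *= 2
        passes += 1
    return data, passes
```

On n items the loop runs once per doubling of the run length, so the number of passes is the smallest p with 2^p at least n.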

A hierarchy based on the number of scans. 
Theorem 4.10. For every fixed $k \ge 1$,

    $ST(k, O(\log k + \log n)) \cap NST(1, O(k \cdot \log n)) \not\subseteq ST\big(k - 1,\ o\big(\tfrac{\sqrt{n}}{k^5 (\log n)^3}\big)\big)$.

The proof of this theorem, which can be found in Appendix J, is based on a 
result due to Duris, Galil and Schnitger [10]. Theorem 4.10 directly implies 

Corollary 4.11. For every fixed $k \in N$ and all classes S of functions from N to
N such that $O(\log n) \subseteq S \subseteq o\big(\tfrac{\sqrt{n}}{(\log n)^3}\big)$ we have $ST(k, S) \subsetneq ST(k + 1, S)$.

Remark 4.12. On the other hand, of course, the hierarchy collapses if internal
memory space is at least linear in the size of the input: For every $r : N \to N$ and
for every $s : N \to N$ with $s(n) \in \Omega(n)$, we have

    $ST(r, s) \subseteq ST(1, n + s(n))$ and $ST(r, O(s(n))) = DSPACE(O(s(n)))$.

5 Tight bounds for filtering and query evaluation on trees 

Lower bound. We need the following notation: We fix a set τ of tag names via
τ := { root, left, right, blank }. Let $T_1$ be the τ-tree from Figure 1. Note that
$T_1$ has a unique leaf $v_1$ labeled with the tag name "left". For any arbitrary τ-tree
T we let $T_1(T)$ be the τ-tree rooted at $T_1$'s root and obtained by identifying
node $v_1$ with the root of T and giving the label "left" to this node. Now, for
every $n \ge 2$ let $T_n$ be the τ-tree inductively defined via $T_n := T_1(T_{n-1})$. It
is straightforward to see that $T_n$ has exactly 2n leaves labeled "blank". Let
$x_1, \ldots, x_n, y_n, \ldots, y_1$ denote these leaves, listed in document order (i.e., in the
order obtained by a pre-order depth-first left-to-right traversal of $T_n$). For an
illustration see Figure 2.

We let $\tau_{01} := \tau \cup \{0, 1\}$. For all sets $X, Y \subseteq \{1, \ldots, n\}$ let $T_n(X, Y)$ be the
$\tau_{01}$-tree obtained from $T_n$ by replacing, for each $i \in \{1, \ldots, n\}$, (i) the label "blank"
of leaf $x_i$ by the label 1 if $i \in X$, and by the label 0 otherwise, and (ii) the label
"blank" of leaf $y_i$ by the label 1 if $i \in Y$, and by the label 0 otherwise.

We let

    $\mathcal{T}_{Sets} := \{ T_n(X, Y) : n \ge 1,\ X, Y \subseteq \{1, \ldots, n\} \}$,

    $\mathcal{T}_{Disj} := \{ T_n(X, Y) \in \mathcal{T}_{Sets} : X \cap Y = \emptyset \}$,

    $\mathcal{T}_{NonDisj} := \{ T_n(X, Y) \in \mathcal{T}_{Sets} : X \cap Y \ne \emptyset \}$.

Lemma 5.1. (a) There is a Core XPath query Q such that the following is true
for all τ-trees $T \in \mathcal{T}_{Sets}$: $\mathrm{Eval}(Q, T) \ne \emptyset \iff T \in \mathcal{T}_{NonDisj}$.

(b) There is an FO-sentence φ such that the following is true for all τ-trees T:
$T \models \varphi \iff T \in \mathcal{T}_{NonDisj}$.

A proof of this lemma can be found in Appendix K. 

Lemma 5.2. Let $r, s : \mathrm{Trees}_\tau \to N$.

If $r(T) \cdot s(T) \in o(\mathrm{depth}(T))$, then $\mathcal{T}_{NonDisj} \notin ST(r, s)$.

The proof is similar to the proof of Lemma 4.4. It is given in Appendix L. 

From Lemma 5.1 and Lemma 5.2 we directly obtain a lower bound on the 
worst-case data complexity of Core XPath filtering: 

Theorem 5.3. The tree-language $\mathcal{T}_{NonDisj}$

(a) can be filtered from $\mathcal{T}_{Sets}$ by a Core XPath query,

(b) is definable by an FO-sentence (and therefore, also definable by a Boolean
MSO query and recognizable by a tree automaton), and

(c) does not belong to the class ST(r, s), whenever
$r, s : \mathrm{Trees}_\tau \to N$ with $r(T) \cdot s(T) \in o(\mathrm{depth}(T))$.

In the following subsection we match this lower bound with the corresponding 
upper bound. 

Upper bounds. Recall that a tree-language $\mathcal{T} \subseteq \mathrm{Trees}_\tau$ is definable by an
MSO-sentence if, and only if, it is recognizable by an unranked tree automaton,
respectively, if, and only if, the language $\{ \mathrm{BinTree}(T) : T \in \mathcal{T} \}$ of associated
binary trees is recognizable by an ordinary (ranked) tree automaton (cf., e.g.,
[7, 9, 30]).

Theorem 5.4 (implicit in [25, 29]). Let $\mathcal{T} \subseteq \mathrm{Trees}_\tau$ be a tree-language. If $\mathcal{T}$
is definable by an MSO-sentence (or, equivalently, recognizable by a ranked or
an unranked finite tree automaton), then $\mathcal{T} \in ST(1, \mathrm{depth}(\cdot) + 1)$.

A direct proof can be found in Appendix M.
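
To give some intuition for why a single scan with buffer space proportional to the depth can suffice, the following Python sketch filters a stream of XML tokens for one fixed, simple property: whether some node has both a child labeled A and a child labeled B. It keeps one constant-size record per currently open node on a stack. This is a hedged illustration of the stack discipline only and is not the tree-automaton construction of Appendix M; the token format (strings such as `<a>`, `</a>`, `<a/>`) and the function name `filter_single_scan` are assumptions made for this example.

```python
def filter_single_scan(tokens, a="A", b="B"):
    """Scan the token stream once; return (found, peak_stack_height).

    `found` is True iff some node has a child labeled `a` and a child
    labeled `b`. The stack holds two flags per currently open node.
    """
    stack = []                    # per open node: [has_child_a, has_child_b]
    peak = 0
    found = False
    for tok in tokens:
        if tok.startswith("</"):  # closing tag: the node is finished
            stack.pop()
            continue
        label = tok.strip("</>")
        if stack:                 # record this node as a child of its parent
            flags = stack[-1]
            flags[0] = flags[0] or label == a
            flags[1] = flags[1] or label == b
            found = found or (flags[0] and flags[1])
        if not tok.endswith("/>"):   # opening tag: the node stays open
            stack.append([False, False])
            peak = max(peak, len(stack))
    return found, peak
```

The stack never holds more entries than the number of simultaneously open nodes, which is at most depth(T) + 1.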

Recall that every Core XPath query is equivalent to a unary MSO query. 
Thus a Core XPath filter can be phrased as an MSO sentence on trees. From
Theorems 5.3 and 5.4 we therefore immediately obtain a tight bound for Core
XPath filtering: 

Corollary 5.5. (a) Filtering from the set of unranked trees with respect to every
fixed Core XPath query Q belongs to $ST(1, O(\mathrm{depth}(\cdot)))$.

(b) There is a Core XPath query Q such that, for all $r, s : \mathrm{Trees}_\tau \to N$ with
$r(T) \cdot s(T) \in o(\mathrm{depth}(T))$, filtering w.r.t. Q does not belong to ST(r, s).

Next, we provide an upper bound for the problem of computing the set 
Eval(Q, T) of nodes in an input tree T matching a unary MSO (or Core XPath) 
query Q. We first need to clarify what this means, because writing the subtree 
of each matching node onto the output tape requires a very large amount of 
internal memory (or a large number of head reversals on the external memory 
tape), and this gives us no appropriate characterization of the difficulty of the 
problem. We study the problem of computing, for each node matched by Q, its
index in the tree, in the order in which they appear in the document Doc(T) . We 
distinguish between the case where these indexes are to be written to the output 
tape in ascending order and the case where they are to be output in descending 
(i.e., reverse) order. 

Theorem 5.6 (implicit in [26,20]). For every unary MSO or Core XPath 
query Q, the problem of computing, for input trees T, the nodes in Eval(Q, T) 

(a) in ascending order belongs to $ST(3, O(\mathrm{depth}(\cdot)))$.

(b) in reverse order belongs to $ST(2, O(\mathrm{depth}(\cdot)))$.

A proof is given in Appendix N. 

Note that this bound is tight: From Corollary 5.5(b) we know that, for some
Core XPath query Q, not even filtering (i.e., checking whether Eval(Q, T) is
empty) is possible in ST(r, s) if $r(T) \cdot s(T) \in o(\mathrm{depth}(T))$.

APPENDIX

A Definitions of Trees and Streams 

Let τ be a finite set. We will use τ as a set of tag names. We associate with
τ a finite alphabet $\Sigma_\tau$ as follows: For each symbol $a \in \tau$, the alphabet $\Sigma_\tau$
contains (i) a symbol <a> (corresponding to the opening tag labeled a), (ii) a
symbol </a> (corresponding to the closing tag labeled a), and (iii) a symbol <a/>
(corresponding to the bachelor tag labeled a).

Binary τ-trees are finite labeled ordered trees where each node has at most
2 children and is labeled with a symbol (i.e., tag name) in τ.

Unranked τ-trees are finite labeled ordered trees where each node may have
an arbitrary number of children and is labeled with a symbol in τ. We use
$\mathrm{Trees}_\tau$ to denote the set of all unranked τ-trees. An unranked τ-tree T can be
represented by a binary tree BinTree(T) in a straightforward way by using the
first-child / next-sibling notation (cf., e.g., [11]).

The XML document Doc(T) corresponding to an unranked τ-tree T can
be viewed as a string over the alphabet $\Sigma_\tau$, cf. Figure 1 in the appendix. In
particular, reading the string Doc(T) from left to right corresponds to a depth-first
left-to-right traversal of the tree T. For a set $\mathcal{T}$ of τ-trees we write $\mathrm{Doc}(\mathcal{T})$
for the string language $\mathrm{Doc}(\mathcal{T}) := \{ \mathrm{Doc}(T) : T \in \mathcal{T} \} \subseteq \Sigma_\tau^*$. We use
size(T) to denote the number of nodes in T, and we use depth(T) to denote the
maximum number of edges on a path from the root to one of T's leaves.
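
The three notions Doc(T), size(T) and depth(T) can be illustrated by a short Python sketch. The representation of a tree as a pair (label, list of children) and the choice to serialize every leaf as a bachelor tag are assumptions made for this example only; the function names are hypothetical.

```python
def doc(tree):
    """Serialize a tree (label, children) by a depth-first left-to-right walk."""
    label, children = tree
    if not children:
        return f"<{label}/>"      # assumption: leaves become bachelor tags
    inner = "".join(doc(child) for child in children)
    return f"<{label}>{inner}</{label}>"


def size(tree):
    """Number of nodes of the tree."""
    return 1 + sum(size(child) for child in tree[1])


def depth(tree):
    """Maximum number of edges on a path from the root to a leaf."""
    children = tree[1]
    return 0 if not children else 1 + max(depth(child) for child in children)


example = ("root", [("left", [("blank", [])]), ("right", [])])
```

For the tree `example`, the serialization is `<root><left><blank/></left><right/></root>`, with four nodes and depth 2.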



B Definition of the Query Language Core XPath 

XPath uses thirteen binary relations - called axes - for navigating in trees. We 
only introduce four of them in this paper, Child (the intuitive child relation; 
Child(v,w) iff w is a child of v), Parent (its inverse), Descendant (the transitive 
closure of Child), and Ancestor (its inverse). In the following definition of Core 
XPath, we assume all 13 axes to be supported. (For a complete formal definition 
of (Core) XPath see [12].) 

Definition B.1. Let T be a tree. We define the syntax of Core XPath by the
EBNF

    corexpath:    locationpath | '/' locationpath
    locationpath: locationstep ('/' locationstep)*
    locationstep: χ '::' P | χ '::' P '[' pred ']'
    pred:         pred 'and' pred | pred 'or' pred
                  | 'not(' pred ')' | corexpath
                  | '(' pred ')'

where "corexpath" is the start production, χ stands for an axis, and P for a
"node test", that is, a tag name from τ or "*", meaning "any node" in $V^T$. We
write $\mathcal{L}$(corexpath) to denote the language defined by the above EBNF for the
symbol corexpath.
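The semantics of Core XPath queries on trees $T$ is defined by two functions
$\mathcal{S}$ and $\mathcal{E}$ (for Core XPath expressions and condition predicates, respectively):

$$\mathcal{S} : \mathcal{L}(\text{corexpath}) \to 2^{V^T \times V^T}$$
$$\mathcal{S}[\![\chi{::}P[e]]\!] := \{ (x,y) \mid \chi^T(x,y) \wedge label^T(y) = P \wedge y \in \mathcal{E}[\![e]\!] \}$$
$$\mathcal{S}[\![/\pi]\!] := V^T \times \{ x \mid (root^T, x) \in \mathcal{S}[\![\pi]\!] \}$$
$$\mathcal{S}[\![\pi_1/\pi_2]\!] := \{ (x,z) \mid \exists y : (x,y) \in \mathcal{S}[\![\pi_1]\!] \wedge (y,z) \in \mathcal{S}[\![\pi_2]\!] \}$$

$$\mathcal{E} : \mathcal{L}(\text{pred}) \to 2^{V^T}$$
$$\mathcal{E}[\![e_1 \text{ and } e_2]\!] := \mathcal{E}[\![e_1]\!] \cap \mathcal{E}[\![e_2]\!]$$
$$\mathcal{E}[\![e_1 \text{ or } e_2]\!] := \mathcal{E}[\![e_1]\!] \cup \mathcal{E}[\![e_2]\!]$$
$$\mathcal{E}[\![\text{not}(e)]\!] := V^T - \mathcal{E}[\![e]\!]$$
$$\mathcal{E}[\![\pi]\!] := \{ x_0 \mid \exists x : (x_0, x) \in \mathcal{S}[\![\pi]\!] \}$$

Here, $\pi$, $\pi_1$ and $\pi_2$ are location paths. Query $Q$ results in the set
$Eval(Q,T) := \{ y \mid \exists x : (x,y) \in \mathcal{S}[\![Q]\!] \}$. □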
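The two semantic functions can be prototyped directly as set computations. The following Python sketch is only an illustration under several assumptions that are not from the paper: queries are encoded as nested tuples (`("step", axis, test, pred)`, `("abs", pi)`, `("seq", pi1, pi2)`, `("and", e1, e2)`, `("or", e1, e2)`, `("not", e)`), nodes are the integers 0..n-1 with node 0 as the root, and only the four axes introduced above are supported. It computes the relations naively and makes no attempt at efficiency.

```python
from itertools import product


class Tree:
    """Rooted labeled tree; nodes are 0..n-1 and node 0 is the root."""

    def __init__(self, labels, parent):
        self.labels = labels      # labels[v] is the tag name of node v
        self.parent = parent      # parent[v] is v's parent (None for the root)
        self.nodes = set(range(len(labels)))

    def axis(self, name):
        """Return the axis relation as a set of pairs (x, y)."""
        child = {(p, v) for v, p in enumerate(self.parent) if p is not None}
        if name == "child":
            return child
        if name == "parent":
            return {(y, x) for x, y in child}
        # Descendant is the transitive closure of Child.
        desc, frontier = set(child), set(child)
        while frontier:
            frontier = {(x, z) for (x, y) in frontier
                        for (y2, z) in child if y == y2} - desc
            desc |= frontier
        if name == "descendant":
            return desc
        if name == "ancestor":
            return {(y, x) for x, y in desc}
        raise ValueError(f"unsupported axis: {name}")


def S(t, expr):
    """Semantics of a Core XPath expression: a set of node pairs."""
    kind = expr[0]
    if kind == "step":            # chi::P  or  chi::P[e]
        _, axis, test, pred = expr
        ok = t.nodes if pred is None else E(t, pred)
        return {(x, y) for (x, y) in t.axis(axis)
                if (test == "*" or t.labels[y] == test) and y in ok}
    if kind == "abs":             # /pi
        targets = {y for (x, y) in S(t, expr[1]) if x == 0}
        return set(product(t.nodes, targets))
    if kind == "seq":             # pi1/pi2
        left, right = S(t, expr[1]), S(t, expr[2])
        return {(x, z) for (x, y) in left for (y2, z) in right if y == y2}
    raise ValueError(f"unknown expression: {kind}")


def E(t, e):
    """Semantics of a condition predicate: a set of nodes."""
    kind = e[0]
    if kind == "and":
        return E(t, e[1]) & E(t, e[2])
    if kind == "or":
        return E(t, e[1]) | E(t, e[2])
    if kind == "not":
        return t.nodes - E(t, e[1])
    return {x for (x, _) in S(t, e)}   # e is a path expression


def evaluate(t, q):
    """Eval(Q, T): all nodes y reachable as the second component of S[Q]."""
    return {y for (_, y) in S(t, q)}
```

With this encoding, the query `/descendant::a[child::b]` becomes `("abs", ("step", "descendant", "a", ("step", "child", "b", None)))`.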



C Definition of the class ST(r, s) for trees 

Let $\tau$ be a set of tag names. Recall from Section 2 that $Trees_\tau$ denotes the set of
all unranked $\tau$-trees.

Definition C.1 ($ST(r,s)$ for trees). Let $r : Trees_\tau \to \mathbb{N}$ and $s : Trees_\tau \to \mathbb{N}$.

(a) A Turing machine $M$ is $(r,s)$-bounded if every run $\rho$ of $M$ on an input string
$Doc(T)$, for all $T \in Trees_\tau$, satisfies the following conditions:

• $\rho$ is finite,

• $rev(\rho) \le r(T)$,

• $\sum_{i=1}^{u} space(\rho, i) \le s(T)$, where $u$ is the number of internal tapes of $M$.

(b) A tree-language $\mathcal{T} \subseteq Trees_\tau$ belongs to the class $ST(r,s)$ if there is a
deterministic $(r,s)$-bounded Turing machine $M$ such that, for all $T \in Trees_\tau$, we
have $T \in \mathcal{T}$ if, and only if, $M$ accepts the string $Doc(T)$. □



D Proof of Lemma 4.1 

Proof of Lemma 4.1:

For the sake of contradiction, let us assume that there is an $(r,s)$-bounded Turing
machine $M$ which accepts a string of the form $f_n(X)g_n(Y)$ if, and only if, $X \cap Y = \emptyset$.
Since $M$ is $(r,s)$-bounded, on an input string of length $N$, $M$'s internal memory
tapes always have length $\le s(N)$, and the external memory tape head can pass
any particular external memory tape position $p$ at most $r(N)$ times.

Let $n_0 \ge 1$ and let $n \ge n_0$ be chosen according to the lemma's assumption. Let
$X$ and $Y$ be arbitrary subsets of $\{1,..,n\}$. We know that the string $f_n(X)g_n(Y)$
has length $\lambda(n)$ and that $f_n(X)g_n(Y) \in L$ if, and only if, $X \cap Y = \emptyset$.

In particular, any internal memory tape configuration during a run of $M$ on
$f_n(X)g_n(Y)$ can be represented by a bit-string of length $d \cdot s(\lambda(n))$, for a suitable
constant $d$.

Let $Q$ denote $M$'s set of states. Using $M$, one obtains a communication
protocol $\mathcal{P}_n$ that computes the disjointness function $Disj_n(\cdot,\cdot)$ as follows: Alice's
input set $X \subseteq \{1,..,n\}$ is represented by the string $f_n(X)$, whereas Bob's input
set $Y$ is represented by the string $g_n(Y)$. Let $p := |f_n(X)|$. Alice starts the
protocol by starting the Turing machine $M$ on input "$f_n(X) \cdots$" and letting it
run until the first time $M$ tries to access the external memory tape position $p+1$.
Then she sends the current state and internal memory tape configuration of $M$
to Bob. That is, she sends $(\log|Q| + d \cdot s(\lambda(n)))$ bits of information. Now, Bob has
all the information needed to continue the execution of $M$ on input "$\cdots g_n(Y)$"
until the first time $M$ tries to access the external memory tape position $p$. Then,
Bob sends the current state and internal memory tape configuration of $M$ to
Alice. Alice and Bob continue in this manner until the Turing machine $M$ stops,
deciding whether or not $f_n(X)g_n(Y)$ belongs to $L$ and hence providing one of
the players with the desired information whether or not $X \cap Y = \emptyset$.

Since $M$ passes the external memory tape position $p$ at most $r(\lambda(n))$
times, the above protocol $\mathcal{P}_n$ computes the function $Disj_n(\cdot,\cdot)$ by exchanging at
most

$$r(\lambda(n)) \cdot (\log|Q| + d \cdot s(\lambda(n)))$$

bits of information. However, since $r(\lambda(n)) \cdot s(\lambda(n)) \in o(n)$, we can find an
$n_0 \in \mathbb{N}$ such that, for every $n \ge n_0$,

$$r(\lambda(n)) \cdot (\log|Q| + d \cdot s(\lambda(n))) < n.$$

Then, the above protocol $\mathcal{P}_n$ computes $Disj_n(\cdot,\cdot)$ with fewer than $n$ bits of
communication, contradicting Theorem 2.2. □
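The accounting in this proof can be illustrated by a small Python sketch. It assumes, purely for illustration, that a run is summarized by the sequence of external tape positions visited by the head; the function name `protocol_cost` and its parameters are hypothetical. Each time the head moves between Alice's part (positions up to $p$) and Bob's part (positions beyond $p$), one message consisting of the current state and the internal tape contents is charged.

```python
import math


def protocol_cost(head_positions, p, num_states, d, s):
    """Bits exchanged when Alice owns positions <= p and Bob owns positions > p.

    head_positions: external tape positions visited during the run, in order.
    num_states:     |Q|, the number of states of the machine.
    d, s:           an internal configuration is encoded with d * s bits.
    """
    bits_per_message = math.ceil(math.log2(num_states)) + d * s
    # A message is sent whenever the head crosses the boundary between p and p+1.
    crossings = sum(
        1
        for a, b in zip(head_positions, head_positions[1:])
        if (a <= p) != (b <= p)
    )
    return crossings * bits_per_message
```

Since the head passes the boundary position at most $r(\lambda(n))$ times, the number of crossings, and hence the total cost, is bounded as in the displayed formula above.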



E Proof of Proposition 4.2 

Proof of Proposition 4.2:

For every $n \in \mathbb{N}$ we choose functions $f_n, g_n : 2^{\{1,..,n\}} \to \{0,1,\#\}^*$ as follows: For
every $X \subseteq \{1,..,n\}$ let $f_n(X) := x\#$ and $g_n(X) := x$, where
$x \in \{0,1\}^n$ is the (unique) $n$-bit string with $S(x) = X$.

Then, for all $n \in \mathbb{N}$ and all $X, Y \subseteq \{1,..,n\}$ we have

$$f_n(X)\, g_n(Y) \in L_{Disj} \iff X \cap Y = \emptyset,$$

and $|f_n(X) g_n(Y)| = 2n+1 \le 3n$.

Assuming that $r(n) \cdot s(n) \in o(n)$, we obtain from Lemma 4.1 that $L_{Disj} \notin ST(r,s)$. □
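The encoding used in this proof is easy to check mechanically. The sketch below assumes that $S(x)$ is the set of positions (counted from 1) at which $x$ carries a 1, and that $L_{Disj}$ consists of the strings $x\#y$ with $|x| = |y|$ whose sets $S(x)$ and $S(y)$ are disjoint. Both readings are assumptions made for illustration, since the definitions are given elsewhere in the paper.

```python
def char_string(X, n):
    """n-bit string x with S(x) = X: bit i is 1 iff i is in X (i = 1..n)."""
    return "".join("1" if i in X else "0" for i in range(1, n + 1))


def f(X, n):
    """f_n(X) := x#"""
    return char_string(X, n) + "#"


def g(Y, n):
    """g_n(Y) := y"""
    return char_string(Y, n)


def in_L_disj(w):
    """Assumed membership test for L_Disj on strings of the form x#y."""
    x, _, y = w.partition("#")
    return len(x) == len(y) and not any(a == b == "1" for a, b in zip(x, y))
```

For example, `f({1, 3}, 4) + g({2}, 4)` is the string `1010#0100` of length $2 \cdot 4 + 1$.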
F Proof of Lemma 4.3

Proof of Lemma 4.3:

We can choose the XQuery query Q :=

    for $x in /rels/rel1/tuple/no1
    for $y in /rels/rel2/tuple/no1
    where deep-equal($x, $y) return <tuple/>

It is straightforward to see that for all finite $A, B \subseteq \mathbb{N}^2$, the result $Eval(Q, T(A,B))$
of $Q$ on the tree $T(A,B)$ consists of one "tuple"-node for each tuple in $A \bowtie B$. In
particular, $Eval(Q, T(A,B))$ is empty if, and only if, $A \bowtie B = \emptyset$. □

G Proof of Lemma 4.4 

Proof of Lemma 4.4:

We use Lemma 4.1. For finite $X, Y \subseteq \mathbb{N}$ let

$$A_X := \{ (i,1) : i \in X \}$$
$$B_Y := \{ (i,2) : i \in Y \}.$$

Obviously, $A_X \bowtie B_Y = \emptyset$ if, and only if, $X \cap Y = \emptyset$.
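The equivalence can be checked exhaustively for small universes. The sketch below assumes that $A_X$ pairs each $i \in X$ with the constant 1, that $B_Y$ pairs each $i \in Y$ with the constant 2, and that the join is taken on the first column, as in the query of Lemma 4.3; the function names are hypothetical.

```python
from itertools import chain, combinations


def A_rel(X):
    """Assumed relation A_X = {(i, 1) : i in X}."""
    return {(i, 1) for i in X}


def B_rel(Y):
    """Relation B_Y = {(i, 2) : i in Y}."""
    return {(i, 2) for i in Y}


def join_first_column(A, B):
    """Pairs of tuples from A and B that agree on their first column."""
    return {(a, b) for a in A for b in B if a[0] == b[0]}


def subsets(universe):
    """All subsets of the given collection, as sets."""
    items = list(universe)
    return [set(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]
```

Iterating over all pairs of subsets of a small universe confirms that the join is empty exactly when the two sets are disjoint.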

For every $n \in \mathbb{N}$ we choose functions $f_n, g_n : 2^{\{1,..,n\}} \to \Sigma_\tau^*$ via

$$f_n(X) := \langle rels \rangle \langle rel1 \rangle \, Doc(A_X) \, \langle /rel1 \rangle$$
$$g_n(Y) := \langle rel2 \rangle \, Doc(B_Y) \, \langle /rel2 \rangle \langle /rels \rangle .$$

Then, for all $X, Y \subseteq \{1,..,n\}$, the string $f_n(X)g_n(Y) = Doc(T(A_X, B_Y))$ has
length $O(n \cdot \log n)$, and

$$f_n(X)g_n(Y) \in Doc(\mathcal{T}_{EmptyJoin}) \iff X \cap Y = \emptyset.$$

From Lemma 4.1 we obtain for arbitrary $r', s' : \mathbb{N} \to \mathbb{N}$ with
$r'(n \cdot \log n) \cdot s'(n \cdot \log n) \in o(n)$ that there is no $(r',s')$-bounded Turing machine which accepts
exactly those strings of the form $f_n(X)g_n(Y)$ where $X \cap Y = \emptyset$. Noting that

$$size(T(A_X, B_Y)) \in \Theta(|Doc(T(A_X, B_Y))|) = \Theta(|f_n(X)g_n(Y)|),$$

one then obtains for arbitrary $r, s : Trees_\tau \to \mathbb{N}$ with $r(T) \cdot s(T) \in o\big(\frac{size(T)}{\log(size(T))}\big)$
that $\mathcal{T}_{NonEmptyJoin} \notin ST(r,s)$. □

H Proof of Theorem 4.7 

Proof Sketch of Theorem 4.7:

We want to join two relations $A$ and $B$ by their common column(s) $K$. Let $n$
be the combined size of the binary representations of $A$ and $B$. Assume we
can sort the tuples of a relation by a number of key columns using an $ST^-(r,s)$
algorithm.

Let us first briefly consider how a merge-join on sorted relations works (cf. e.g.
[27]). Observe that on sorted relations $A$ and $B$ - and sort we can - a merge-join
can compute a natural join $A \bowtie B$ using only buffer space for one tuple if there
is a one-to-many relationship between $A$ and $B$; say for each tuple $t \in A$, there
is at most one tuple $t' \in B$ such that $t.K = t'.K$. The normal mode of operation
for a merge-join is to have the two sorted relations on separate external memory
tapes. But this is not necessary; we can simply add another column RelId to both
relations in which we store the name of the relation ("A" or "B") for each tuple,
with "B" < "A" to get the single $B$-tuple before the possibly multiple matching
$A$-tuples. Then we sort the union of the two modified relations on (K, RelId)
(with K the more significant column). Now we can compute $A \bowtie B$ in a single
forward scan of the sorted sequence, buffering a $B$-tuple whenever it is encountered (in a
single slot, always replacing a previously inserted $B$-tuple) and comparing the
$B$-tuple in the buffer with the $A$-tuples that follow.
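The single-scan merge-join just described can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, tuples are simplified to (key, value) pairs, and Python's in-memory `sort` stands in for the external sorting machine assumed in the proof.

```python
def merge_join_one_pass(A, B):
    """Join A and B on the key, assuming each key occurs at most once in B.

    A, B: lists of (key, value) pairs.
    Returns a list of (key, a_value, b_value) triples.
    """
    # Tag every tuple with its relation name (the RelId column).
    tagged = [(k, "B", v) for k, v in B] + [(k, "A", v) for k, v in A]
    # Sort on (K, RelId) with "B" < "A", so the single B-tuple for a key
    # precedes all matching A-tuples.
    tagged.sort(key=lambda t: (t[0], 0 if t[1] == "B" else 1))

    out = []
    buf = None                      # the single buffer slot for a B-tuple
    for k, rel, v in tagged:        # one forward scan of the sorted sequence
        if rel == "B":
            buf = (k, v)            # replaces any previously buffered B-tuple
        elif buf is not None and buf[0] == k:
            out.append((k, v, buf[1]))
    return out
```

Only one $B$-tuple is held at any time, which mirrors the claim that buffer space for a single tuple suffices in the one-to-many case.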
Fig. 3. Example run of the join algorithm.



In general, there is no one-to-many relationship between $A$ and $B$, but we can
use the following trick. Given an $ST^-(r,s)$ machine $M$ for sorting, we modify
$M$ as follows:

- We add three registers of size $\log n$ to the internal memory tape, which are
called sizeB, currentBidx, and iteratorA. Moreover, we need a register tup
for storing one tuple from either $A$ or $B$ (plus, strictly speaking, a log-sized
register for storing the $I_B$ column that we will add to each $B$-tuple).

- Before we start with the usual operation of $M$, we make a first forward run
over the external memory tape to count the tuples in $B$ and store $|B|$ in
the internal memory tape register sizeB. When this is done, we reverse and
move back to the start of the external memory tape. (This accounts for two
reversals of the read head on the external tape.)

- Then, we start the usual operation of $M$. However, whenever we see an $A$-tuple
$(K : k, V_A : x)$ (which we may store on the internal memory tape)
on the external memory tape, we simulate the reading of the $|B|$ tuples
$(K : k, V_A : x, I_B : 1, RelId : \text{"A"}), \ldots, (K : k, V_A : x, I_B : |B|, RelId : \text{"A"})$.
(We may use the iteratorA internal memory tape register to count up to
sizeB and copy an $A$-tuple into tup when we read it, so that no
further scans are required when we hand it over to $M$ multiple times.) That is,
when $M$ asks to move the read head on to an $A$-tuple from the left, we show
it the tuple with $I_B = 1$; when it asks to move on to the right, we show it
the same tuple again with $I_B = 2$, and so on, until $I_B = |B|$. If $M$ then asks to
move on to the right, we really move on to the next tuple to the right. When
we move to the left, we proceed analogously, but provide decreasing indexes
$I_B$, starting with $|B|$.

Whenever we see a $B$-tuple $(K : k, V_B : y)$ on the external memory tape, we
simulate the reading of a tuple $(K : k, V_B : y, I_B : i, RelId : \text{"B"})$, where $i$ is
the current value in the currentBidx register.

Note that simulating such a much larger tape requires the external
tape to be read-only. But this is assured as $M$ is an $ST^-$-machine.

Since we simulate a relation of size not greater than $n^2$, $M$ is in $ST^-(r(n^2), s(n^2))$.

- We sort on key $(K, I_B, RelId)$ (with decreasing significance from $K$ to RelId)
of the $A$ and $B$ tuples, with sort order "B" < "A" on the relation names in
the RelId column, so that each $B$-tuple again precedes its matching $A$-tuples.

Now observe that the simulated tuples define relations $A'$, $B'$ such that, in
relational algebra,

$$\pi_{K,V_A,V_B}\big(\pi_{K,V_A,I_B} A' \bowtie \pi_{K,V_B,I_B} B'\big) = A \bowtie B,$$

but now there is a guaranteed many-to-one relation between $A'$ and $B'$. An
example of our construction is provided in Figure 3.

- While the tuples are produced in ascending sorted order by $M$, rather than
writing them to the output immediately, we proceed as follows. Whenever
we see a $B'$-tuple, we copy it into our tup register on the internal memory
tape. Whenever we see an $A'$-tuple $t$ (and a $B'$-tuple has been seen before),
we check whether $t.K = tup.K$ and $t.I_B = tup.I_B$. If there is a match, we
produce a tuple $(K, V_A, V_B)$ and write it to the output.

It is easy to verify that this construction provides us with an
$ST^-\big(r(n^2) + 2,\; s(n^2) + O(\log n + \max_{t \in A \cup B} |t|)\big)$ machine for joining two relations. □
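The overall construction can be sketched in Python. Two simplifications are assumptions of this sketch rather than part of the proof: the relation $A'$ is materialized in memory instead of being simulated on a read-only tape, and Python's `sorted` stands in for the sorting machine $M$. The function name and the tuple layout `(K, I_B, RelId, value)` are hypothetical.

```python
def join_via_sort(A, B):
    """Natural join of A and B on the key column, via one sort and one scan.

    A: list of (k, x) pairs; B: list of (k, y) pairs.
    Returns a list of (k, x, y) triples.
    """
    size_b = len(B)
    # A': every A-tuple is replicated once for each index I_B = 1..|B|.
    a_prime = [(k, i, "A", x) for (k, x) in A for i in range(1, size_b + 1)]
    # B': the i-th B-tuple receives the unique index I_B = i.
    b_prime = [(k, i, "B", y) for i, (k, y) in enumerate(B, start=1)]
    # Sort on (K, I_B, RelId), with the B'-tuple ahead of matching A'-tuples.
    stream = sorted(a_prime + b_prime,
                    key=lambda t: (t[0], t[1], 0 if t[2] == "B" else 1))

    out = []
    tup = None                       # the single tuple register
    for k, i, rel, v in stream:      # one forward scan of the sorted stream
        if rel == "B":
            tup = (k, i, v)
        elif tup is not None and tup[0] == k and tup[1] == i:
            out.append((k, v, tup[2]))
    return out
```

Because each pair $(K, I_B)$ occurs in at most one $B'$-tuple, the single register `tup` suffices even when the original relations are in a many-to-many relationship.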

I Proof of Corollary 4.9 

Proof Sketch of Corollary 4.9:

ad (a): By contradiction. Suppose there is an $(r,s)$-bounded Turing machine
which computes KeySort (i.e., writes the sorted version of the input to its
write-only output tape), but which does not write to its external tape.

Then, by Theorem 4.7, we can compute the join of two input relations $A, B$
in

$$ST\big(r(n^2) + 2,\; s(n^2) + O(\log n) + O(\max_{t \in A \cup B} |t|)\big). \qquad (1)$$

Now consider the problem $Join_{\log}$, which is defined as the restriction of the
natural join problem to input relations $A, B$ where the size $|t|$ of (the binary
representation of) each tuple $t \in A \cup B$ is at most logarithmic in the size of the
binary representation of the input relations $A, B$. From (1) we then know that
this problem $Join_{\log}$ must belong to

$$ST\big(r(n^2) + 2,\; s(n^2) + O(\log n)\big). \qquad (2)$$

Let $r', s' : \mathbb{N} \to \mathbb{N}$ with $r'(n) := r(n^2) + 2$ and $s'(n) \in s(n^2) + O(\log n)$; thus
$Join_{\log} \in ST(r', s')$.

From the corollary's assumption on the asymptotic size of $r, s$ we know that
$r'(n) \cdot s'(n) \in o\big(\frac{n}{\log n}\big)$.

However, a variation of the proof of Lemma 4.4 shows that

$$Join_{\log} \notin ST(r', s'), \qquad (3)$$

if $r', s' : \mathbb{N} \to \mathbb{N}$ with $r'(n) \cdot s'(n) \in o\big(\frac{n}{\log n}\big)$. (To prove this, one can use a variant
of the disjointness problem $Disj_n(\cdot,\cdot)$ where it is known that each of the given
sets $X, Y$ has at least $j$ elements - the communication complexity for deciding
whether $X$ and $Y$ are disjoint then is $\ge j$.)
This completes the proof of (a).

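ad (b): This is a direct consequence of (a), because

$$o\big((n^2)^{1/5}\big) \cdot \big(O\big((n^2)^{1/5}\big) + O(\log n)\big) \subseteq o(n^{2/5}) \cdot O(n^{2/5}) \subseteq o(n^{4/5}) \subseteq o\big(\tfrac{n}{\log n}\big).$$

□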



J Proof of Theorem 4.10 

Duris, Galil, and Schnitger [10] prove an exponential gap between $k$- and $(k+1)$-round
communication complexity. They consider functions $f : \{0,..,2^m-1\} \to \{0,..,2^m-1\}$,
encoded as a list of binary representations of the values $f(0), f(1), \ldots, f(2^m-1)$,
and prove a lower bound on the $k$-round communication complexity
of the language $L_{k+1}$, consisting of the encodings of functions $f$ where

$$\underbrace{f(f(\cdots f(f(0)) \cdots))}_{k+1 \text{ applications of } f} = 2^m - 1.$$
The precise definition of $L_{k+1}$ is as follows:

Definition J.1. For every $k \in \mathbb{N}$, let $L_{k+1} :=$

$$\{\, w_0 w_1 \cdots w_{2^m-1} \;:\; m \in \mathbb{N},\ w_i \in \{0,1\}^m, \text{ and there exist } j_1,..,j_{k+1} \text{ such that } w_0 = j_1,\ w_{j_1} = j_2,\ \ldots,\ w_{j_k} = j_{k+1} = 2^m - 1 \,\}.$$ □
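Membership in $L_{k+1}$ amounts to following a chain of pointers. The Python sketch below assumes the reading in which the chain starts at block $w_0$ and exactly $k+1$ pointers are followed, each block being interpreted as an $m$-bit binary number; the function name `in_L` is hypothetical.

```python
def in_L(word: str, m: int, k: int) -> bool:
    """Check whether word = w_0 w_1 ... w_{2^m - 1} belongs to L_{k+1}.

    Each block w_i is an m-bit string, read as the binary number f(i).
    The word is accepted iff applying f to 0 exactly k+1 times gives 2^m - 1.
    """
    if m < 1 or len(word) != m * 2 ** m:
        return False
    if any(c not in "01" for c in word):
        return False
    blocks = [int(word[i * m:(i + 1) * m], 2) for i in range(2 ** m)]
    j = 0
    for _ in range(k + 1):          # follow k+1 pointers, starting at block 0
        j = blocks[j]
    return j == 2 ** m - 1
```

For $m = 2$, the word `01110011` encodes $f(0) = 1$, $f(1) = 3$, $f(2) = 0$, $f(3) = 3$, so two applications of $f$ to 0 reach $3 = 2^2 - 1$.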

Theorem J.2 (Duris, Galil, Schnitger [10]). For every $k \ge 1$, the following
is true for all sufficiently large $n \in \mathbb{N}$:

$$\text{comm-compl}_{k+1}(F_{k+1,n}) \le (k+1) \cdot \log n,$$
$$\text{comm-compl}_{k}(F_{k+1,n}) \ge \frac{n}{36\, k^4 (\log n)^3},$$

where the function $F_{k+1,n} : \{0,1\}^{n/2} \times \{0,1\}^{n/2} \to \{0,1\}$ is given via

$$F_{k+1,n}(x,y) := \begin{cases} 1, & \text{if } xy \in L_{k+1} \\ 0, & \text{otherwise.} \end{cases}$$


In fact, Duris et al. [10] prove an even stronger result, namely that their
lower bound applies for all $k$-round protocols, even if communication complexity
is measured as the minimum complexity over all arbitrary partitions of the input
bits into two parts of equal size.

Proof of Theorem 4.10:

We use Theorem J.2 and let $L'_{k+1} :=$

$$\{\, 1^m \# w_0 \cdots w_{2^m-1} \;:\; m \in \mathbb{N},\ w_i \in \{0,1\}^m,\ w_0 \cdots w_{2^m-1} \in L_{k+1} \,\},$$

where $L_{k+1}$ is the language fixed in Definition J.1.

From the definition of $L_{k+1}$ it is straightforward to see that $L'_{k+1}$ belongs to
$ST(k+1, O((\log k) + (\log n)))$ - the Turing machine just has to store the current
index $i \in \{0,..,k+1\}$ and the corresponding string $w_{j_i}$ on its internal tapes and
move the external tape head to the block of index $j_{i+1} := w_{j_i}$. To recognize
$L'_{k+1}$, this requires at most $k$ changes of the direction of the external tape head
and internal space $O((\log k) + (\log n))$.

A nondeterministic Turing machine with internal space $\Omega(k \cdot \log n)$ does not
even need a single reversal of the external tape head - it can simply guess the
strings $w_{j_1}, .., w_{j_{k+1}}$ on one of its internal tapes and verify their "correctness"
while scanning the external tape from left to right.

Assume, for the sake of contradiction, that $L'_{k+1} \in ST\big(k, o\big(\frac{n}{k^5 (\log n)^3}\big)\big)$ via
a Turing machine $M$ that is $(k,s)$-bounded, for some function $s : \mathbb{N} \to \mathbb{N}$ with
$s(n) \in o\big(\frac{n}{k^5 (\log n)^3}\big)$. Then, in the same way as in the proof of Lemma 4.1, $M$
leads to a $k$-round protocol $\mathcal{P}_n$, for all $n \in \mathbb{N}$, that computes the function $F_{k+1,n}$
from Theorem J.2 and has cost at most $d \cdot k \cdot s(n)$, for a suitable constant $d$
(depending on $M$, but not on $k$ or $n$). Since $s(n) \in o\big(\frac{n}{k^5 (\log n)^3}\big)$, we can find
sufficiently large $n$ such that $d \cdot s(n) < \frac{n}{36\, k^5 (\log n)^3}$. Consequently, for such $n$ we
have $\text{comm-compl}_k(F_{k+1,n}) \le d \cdot k \cdot s(n) < \frac{n}{36\, k^4 (\log n)^3}$, contradicting Theorem J.2.
K Proof of Lemma 5.1

Proof of Lemma 5.1:

ad (a): We can choose Q :=

    /descendant::*[child::right/child::right/child::1]/child::left/child::1

which selects all nodes $x$ that are labeled 1 and for which there exists a node $z$
such that

(i) there exists a child $z'$ of $z$ which is labeled "left" such that $x$ is a child of
$z'$,

(ii) there exists a child $z''$ of $z$ which is labeled "right" and has a child $z'''$
labeled "right" that has a child labeled 1.

It is straightforward to check that for all $T(X,Y) \in \mathcal{T}_{Sets}$ we have that $Q(T(X,Y))$
consists of exactly those nodes $x_i$ for which both $x_i$ and $y_i$ are labeled 1, i.e.,
$Q(T(X,Y)) = \{ x_i : i \in X \cap Y \}$.

ad (b): The above query $Q$ can be translated in a straightforward way into an
FO-formula $\psi(x)$. The desired FO-sentence $\varphi$ is chosen as $\varphi := \chi \wedge \exists x\, \psi(x)$,
where $\chi$ is a suitable FO-sentence expressing that the underlying tree has the
correct shape. □



L Proof of Lemma 5.2 

Proof of Lemma 5.2: 

We use Lemma 4.1. For every $n \in \mathbb{N}$ let $p_n$ denote the position in the string
$Doc(T_n)$ that carries the unique leaf of $T_n$ carrying the label "left".

For every $n \in \mathbb{N}$ we choose functions $f_n, g_n : 2^{\{1,..,n\}} \to \Sigma_\tau^*$ as follows: For
every $X \subseteq \{1,..,n\}$ let $f_n(X)$ be the prefix of $Doc(T_n(X,Y))$ up to position $p_n$,
and let $g_n(Y)$ be the suffix of $Doc(T_n(X,Y))$ starting at position $p_n + 1$. Then,
the string $f_n(X)g_n(Y) = Doc(T_n(X,Y))$ has length $10n + 1 \le 11 \cdot n$, and

$$f_n(X)g_n(Y) \in Doc(\mathcal{T}_{Disj}) \iff X \cap Y = \emptyset.$$

From Lemma 4.1 we obtain for arbitrary $r', s' : \mathbb{N} \to \mathbb{N}$ with $r'(n) \cdot s'(n) \in o(n)$
that there is no $(r',s')$-bounded Turing machine which accepts exactly those
strings of the form $f_n(X)g_n(Y)$ where $X \cap Y = \emptyset$.

Noting that $depth(T_n(X,Y)) = 2n+2 \in \Theta(|Doc(T_n(X,Y))|) = \Theta(|f_n(X)g_n(Y)|)$,
one then obtains for arbitrary $r, s : Trees_\tau \to \mathbb{N}$ with $r(T) \cdot s(T) \in o(depth(T))$
that $\mathcal{T}_{NonDisj} \notin ST(r,s)$. □

M Proof of Theorem 5.4 

Proof Sketch of Theorem 5.4:
We proceed in two steps.

Fig. 4. Unranked ordered tree (a) and corresponding binary (first-child, next-sibling)-
tree (b) with traversal order for the evaluation of a bottom-up automaton.

Step 1: We first show that $\mathcal{T} \in ST(1, depth(\cdot)+1)$.

Let $B$ be a bottom-up binary tree automaton which accepts exactly the binary
trees $BinTree(T)$ for $T \in \mathcal{T}$. For simplicity, we assume a single transition function
$\delta^B : \Sigma \times (Q \cup \{\bot\}) \times (Q \cup \{\bot\}) \to Q$.

We may assume that the input XML document $Doc(T)$ consists of a well-formed
sequence of opening tags $\langle a \rangle$ and closing tags $\langle /a \rangle$, for tag symbols $a \in \tau$.⁵ We
evaluate $B$ as follows, using a stack of states of the automaton $B$. First we scan
the input to the end. Then we reverse and scan it backwards. While scanning
backwards, we do the following for each symbol $s$ seen:

if s is a closing tag then
begin
    if there was no previous symbol or
       it was a closing tag then
        push(⊥);
end
else if s is an opening tag ⟨a⟩ then
begin
    if the previous symbol was a closing tag then
        q1 := ⊥;
    else
        q1 := pop();
    q2 := pop();
    q := δ^B(a, q1, q2);
    push(q);
end

⁵ Non-well-formed input can easily be detected by putting opening tags on the stack
that we maintain in the algorithm above.

Consider the example run of Figure 4. Just after having processed the opening
tag of node $v_3$, the stack contains the symbols $\bot, \rho^B(v_1), \rho^B(v_2), \rho^B(v_3)$ (where
the final symbol is the top of the stack, and $\rho^B(v)$ denotes the state assigned to
node $v$ by the run $\rho$ of the tree automaton $B$).

It is easy to verify that whenever we are at a node $v$ at depth $d$ in the unranked
tree $T$ (i.e., between the opening and the closing tag of $v$, and not between the
opening and closing tag of a descendant of $v$), there are $d+1$ items on the stack.
Thus the depth of the stack never exceeds $depth(T) + 1$. Since every stack entry
consists of a single symbol, the space consumption of the internal memory tape
is bounded by $depth(T) + 1$.

On termination of this $ST(1, O(depth(\cdot)))$ algorithm, the stack will contain
precisely one symbol.

It is not difficult to verify that B accepts BinTree(T) if and only if after 
processing the final (and thus leftmost) symbol of the input, the top of the stack 
holds a final state of B. 
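The backward scan of Step 1 can be sketched in Python as follows. The representation of $Doc(T)$ as a list of `("open", a)` / `("close", a)` pairs, the function name `run_backward`, and the use of `None` for the symbol ⊥ are assumptions of this sketch. The transition function `delta` is passed in as an ordinary Python function and is not restricted to a finite state set here.

```python
BOTTOM = None  # stands for the symbol ⊥ (no first child / no next sibling)


def run_backward(tokens, delta):
    """Evaluate a bottom-up automaton on BinTree(T) while scanning Doc(T) backwards.

    tokens: Doc(T) as a list of ("open", a) / ("close", a) pairs.
    delta:  delta(a, q1, q2) is the state of a node labeled a whose first child
            has state q1 and whose next sibling has state q2 (BOTTOM if absent).
    Returns the state of the root and the maximum stack height observed.
    """
    stack = []
    prev = None                      # kind of the previously processed symbol
    max_height = 0
    for kind, a in reversed(tokens):             # the backward scan
        if kind == "close":
            if prev is None or prev == "close":
                stack.append(BOTTOM)             # this node has no next sibling
        else:
            # A preceding closing tag means the node is a leaf (no first child).
            q1 = BOTTOM if prev == "close" else stack.pop()
            q2 = stack.pop()                     # state of the next sibling
            stack.append(delta(a, q1, q2))
        prev = kind
        max_height = max(max_height, len(stack))
    assert len(stack) == 1                       # exactly one symbol remains
    return stack[0], max_height
```

As a usage example, the transition function `lambda a, q1, q2: 1 + (q1 or 0) + (q2 or 0)` counts the nodes of the tree; on the document of the tree a(b(d), c) the observed stack height is 3, which is $depth(T) + 1$.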

Step 2: From $ST(1, depth(\cdot)+1)$ to $ST(0, depth(\cdot)+1)$.

Just as a binary bottom-up tree automaton on the (first-child, next-sibling)
representation of (unranked) $\tau$-trees can be computed, so can a binary tree
automaton $B$ be computed that works on a (last-child, next-sibling) binary tree
representation.

We can evaluate B in one single forward scan of the input by taking the algo- 
rithm of Step 1, and exchanging every occurrence of "opening tag" by "closing 
tag" and vice-versa. Now we need only one forward scan to check whether B 
accepts. 

Altogether, the proof of Theorem 5.4 is complete. □ 

N Proof of Theorem 5.6 

Proof Sketch of Theorem 5.6:

(a) In [20], a technique for evaluating unary MSO queries in two scans of the
data is described. The first scan is a backward bottom-up tree automaton scan
that writes the states computed for the nodes visited to an output tape. This tape is
read by the second scan, a forward scan during which a top-down deterministic tree
automaton is evaluated.

Let $A$ and $B$ be a pair of a bottom-up and a top-down automaton for
evaluating XPath expression $\pi$ in this way.

After scanning to the end of the input, we perform a backward scan during
which we compute the run of $A$ as described in the first part of the proof of
Theorem 5.4. Here we always replace the opening tag $\langle a \rangle$ of node $v$ on the
tape by a symbol $\langle a, q = \rho^A(v) \rangle$. (This is again a single tape symbol as both
$\Sigma$ and the state set $Q^A$ are fixed.) At the end of this run, we have, for each
node $v$, the state $\rho^A(v)$ computed by the run of $A$ attached to it. Note that in
the algorithm of the proof of Theorem 5.4, $\rho^A(v)$ always becomes available when
the head on the external memory tape is on the position of the opening tag of
node $v$, so we need no further buffer space besides the space occupied by the
stack. Then we perform a third scan, a forward scan during which we compute
the run of $B$. $B$ is a deterministic top-down tree automaton and the state of a
node depends only on its label and the state of its parent. As $B$ runs on the
(first-child, next-sibling) representation of unranked trees, we always have $\rho^B(v)$
available as soon as we have read the opening tag of node $v$. According to the
construction of [20], the state $\rho^B(v)$ indicates whether $v$ is in the query result.
We maintain a counter (initialized with 0) and during the scan, whenever we see
an opening tag we increment it by one. Thus, whenever we decide that a node
is part of the output, we write the current value of the counter - which is the
index of the node in document order - to the output tape. This gives us the
nodes matching the query in ascending order.

(b) Using the same ideas as in the proof of Theorem 5.4 (changing the
automata from running on (first-child, next-sibling) to (last-child, next-sibling)-trees),
we can compute the indexes of nodes matching a unary MSO query in
reverse order (i.e., we output the node indexes while traversing the data
backwards). □

Remark N.1. The proof of Theorem 5.6 requires (i) scanning the external memory
tape both forward and backward, and (ii) storing states of the bottom-up
automaton used in the proof construction of Theorem 5.4 on the external tape.
If the query is considered fixed (data complexity), states are of constant size and
can replace symbols of the input; but this means that we need to allocate enough
space to store a state in each tape position of the input. The results of [20]
only readily yield automata $A$ whose state space is of size doubly exponential in
the size of the given query (in the query language of the framework, monadic
datalog). If we want to use this technique, we need Turing machines whose external
tape alphabet is of size doubly exponential in the size of the given query.



